ABSTRACT

Polbase (http://polbase.neb.com) is a freely access- 
ible database of DNA polymerases and related ref- 
erences. It has been developed in a collaborative 
model with experts whose contributions reflect their 
varied backgrounds in genetics, structural biology 
and biochemistry. Polbase is designed to compile 
detailed results of polymerase experimentation, pre- 
senting them in a dynamic view to inform further 
research. After validation, results from references 
are displayed in context with relevant experimental 
details and are always traceable to their source pub- 
lication. Polbase is connected to other resources, 
including PubMed, UniProt and the RCSB Protein 
Data Bank, to provide multi-faceted views of poly- 
merase knowledge. In addition to a simple web 
interface, Polbase data is exposed for custom ana- 
lysis by external software. With the contributions 
of many polymerase investigators, Polbase has 
become a powerful research tool covering most 
important aspects of polymerases, from sequence 
and structure to biochemistry. 

INTRODUCTION 

DNA Polymerases are responsible for faithful replication 
of DNA and maintenance of genomic integrity via repair 
and recombination. These enzymes catalyze the addition 
of free nucleotides to the 3'-end of a growing deoxyribo- 
nucleic acid polymer. The polymerase forms a complex 
with the template strand, priming strand and incoming 
nucleotide to catalyze the addition of a specific dNTP to 
the priming strand. 

Since the discovery of the first DNA polymerase by 
Arthur Kornberg 60 years ago (1), a variety of specialized 
polymerase types with roles in the many aspects of genome 
replication and maintenance have been revealed (2). 

Polymerase nomenclature has evolved over time to reflect
the changing polymerase landscape (3,4). The current
system (4) categorizes polymerases into seven families
based on sequence similarity and biological roles.
Recently, exploration of repair and trans-lesion polymerases
in families X and Y (5,6) has added nuance to the
balance between faithful replication and cell survival.

In addition to their biological functions, polymerases 
have proven to be pillars of biotechnology. The discovery 
of thermostable polymerases (7) enabled the Polymerase 
Chain Reaction (PCR), forever changing molecular biology
(8). Specialized polymerases are at the core of the current 
generation of DNA sequencing platforms and a growing 
variety of diagnostic and detection technologies. 

Polymerases play a role in various diseases, including 
many genetic disorders, viral infections and cancers. Thus, 
understanding basic properties of these enzymes has been 
crucial to diagnosis and treatment. Despite their central 
biological function and importance to biotechnology, no 
single repository of polymerase information existed before 
Polbase. 

POLBASE IMPLEMENTATION 

Polbase is a collaborative, open database focused exclu- 
sively on DNA polymerases. It is intended to provide a 
unified information resource for both polymerase experts 
and those just entering the field. Polbase does not attempt 
to replace existing protein and genetic information re- 
sources. Instead, Polbase compiles the information avail- 
able in external resources, extending it with results 
extracted from the primary literature and polymerase- 
specific features. Since authors are in the best position to 
enter the important results from their publications quickly 
and accurately, we ask them to perform this critical 
function. Polbase was begun with the contribution of ref- 
erences and curation efforts from a small number of 
founding collaborators. More recently the larger polymer- 
ase community has been engaged to finish the work of 
cataloging the wealth of information in this mature field. 
By spreading the effort of maintaining Polbase through- 
out the polymerase community, minimal effort is required 
of any single research group. 
New polymerase references are discovered and added to
Polbase via automated tools or manual submission. As
new papers are added, the corresponding author is contacted
by email and asked to complete and validate the
Polbase representation of their work.

Each reference follows a linear path as it is imported
into Polbase. After a reference is created, its topics are
specified by picking from a short list of polymerase-relevant
topics. Polymerases are then added to the reference's
entry. If a new polymerase is encountered, a
place-holder polymerase entry is created. After all polymerases
and mutants are listed, the paper's primary results
and the relevant experimental conditions are added by the
contributor and indexed for searching. When all the
results from a paper have been entered, an author 'validates'
Polbase's representation of their work. Users may
track the status of each reference in a queue on their
personal account page. Polbase captures increasing
amounts of information as the paper progresses through
this pathway.
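
The linear import path described above can be pictured as a small state machine. The following is only a sketch: the stage names, the Reference class and its fields are illustrative assumptions and do not describe Polbase's actual schema or code.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    """Hypothetical stage names, ordered as in the text above."""
    CREATED = 0
    TOPICS_SPECIFIED = 1
    POLYMERASES_LISTED = 2
    RESULTS_ENTERED = 3
    VALIDATED = 4


@dataclass
class Reference:
    """Illustrative reference record; field names are assumptions."""
    title: str
    stage: Stage = Stage.CREATED
    topics: list = field(default_factory=list)
    polymerases: list = field(default_factory=list)
    results: list = field(default_factory=list)

    def advance(self) -> Stage:
        """Move to the next stage; the path is linear, so none is skipped."""
        if self.stage is Stage.VALIDATED:
            raise ValueError("reference is already validated")
        self.stage = Stage(self.stage + 1)
        return self.stage


# Usage: a made-up reference moving through the first two steps.
ref = Reference("Example polymerase paper")
ref.topics.append("Kinetic Parameters")
ref.advance()  # topics specified
ref.polymerases.append("T4")
ref.advance()  # polymerases listed
```

Keeping the stage as an ordered value makes it straightforward to show each reference's progress in a per-user queue.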

Topic and polymerase specifications allow Polbase to
expand PubMed's author, abstract and title searching to
include reference searches not only by polymerase (including
aliases), polymerase family and host organism, but
also by polymerase-specific topics. Topics such as
'Nucleotide Substitution' and 'Kinetic Parameters' improve
searches for references and primary data (Table 1) and
simplify results entry. Polbase also extracts polymerase
features from the RCSB Protein Data Bank (PDB) (9),
allowing users to find structures by polymerase, family
and presence or absence of DNA in the protein structure.

Relevant experimental details from each paper are 
stored and displayed with results to allow users to assess 
information in context. Available contextual information 
varies by result type, and includes details such as salt con- 
centration, presence of accessory factors, reactants, 
experimental technique used, etc. This system can be 
readily extended to accommodate new contexts as they 
arise or increase in importance to the user community. 
All results are also linked to their source publication so
they can be easily found in their original context.



Polbase avoids editorial positions on the quality of any 
given result and does not present averaged values. Instead, 
summarized results are presented as ranges and link to 
individual results, which are presented with their context 
allowing the user to assess which are most relevant. 
Authenticated users and authors of primary data are
encouraged to mark references as completely and/or correctly
represented in Polbase to facilitate such assessment.
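
As an illustration of summarizing by range rather than by average, the sketch below formats a set of reported values as a minimum-maximum range followed by the number of contributing results. The function name, the output format and the example values are assumptions made for illustration, not Polbase's implementation.

```python
def summarize_as_range(values, unit):
    """Summarize reported values as 'min-max unit (n)' without averaging.

    `values` is a list of numbers taken from individual results; `n` is
    the number of results contributing to the summary.
    """
    if not values:
        raise ValueError("no results to summarize")
    lo, hi = min(values), max(values)
    n = len(values)
    if lo == hi:
        # A single distinct value is shown on its own, not as a range.
        return f"{lo:.1e} {unit} ({n})"
    return f"{lo:.1e}-{hi:.1e} {unit} ({n})"


# Usage with made-up error rates from two hypothetical results.
summary = summarize_as_range([1.2e-06, 3.0e-07], "errors/bp")
```

Because each summary keeps a count and can link back to the individual results, the reader rather than the database decides which measurement is most relevant.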

References are not limited to journal articles; negative 
results and other data not suitable for traditional publica- 
tion can be made publicly available in Polbase with ap- 
propriate indications about the source of this information. 

Polbase is built on a carefully designed table structure in 
a proven relational database system (10) so that the 
compiled information will be available to future applica- 
tions in addition to its current web interface. 

INFORMATION SOURCES 

Polbase is tightly integrated with existing databases to 
provide a polymerase-centric perspective without unneces- 
sary duplication of effort. All reference entries are linked 
to PubMed entries where they exist. New references in 
PubMed are discovered, imported and associated with 
polymerases by semi-automated tools. If new publications 
contain a corresponding author's email address or have an 
author with an active Polbase account, the paper is added 
to the author's 'queue'. Authors receive notification of 
new references according to their contact preferences. 

Polbase is updated daily with all new reports of poly- 
merase structures in the PDB (E.C.# 2.7.7.7 or 2.7.7.49). 
These structures are linked with existing polymerase 
entries where possible. Polymerase entries are linked
with UniProt (11). Host organisms are linked with the
NCBI Taxonomy tree (12).
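
The selection and linking step can be sketched as follows. Only the two E.C. numbers come from the text; the record layout, the alias lookup and the example entries (including their identifiers) are hypothetical and are not the PDB's or Polbase's actual data model.

```python
# E.C. numbers named in the text: 2.7.7.7 and 2.7.7.49.
POLYMERASE_EC_NUMBERS = {"2.7.7.7", "2.7.7.49"}


def is_polymerase_entry(ec_numbers):
    """True if any E.C. number of a structure entry is a polymerase class."""
    return not POLYMERASE_EC_NUMBERS.isdisjoint(ec_numbers)


def link_structures(entries, alias_to_polymerase):
    """Split polymerase structures into linked and not-yet-linked.

    `entries` are dicts with hypothetical keys 'pdb_id', 'ec_numbers' and
    'molecule_name'; `alias_to_polymerase` maps lower-case names or
    aliases to existing polymerase entries.
    """
    linked, unlinked = {}, []
    for entry in entries:
        if not is_polymerase_entry(entry["ec_numbers"]):
            continue  # not a polymerase structure
        polymerase = alias_to_polymerase.get(entry["molecule_name"].lower())
        if polymerase is None:
            unlinked.append(entry["pdb_id"])  # left for manual curation
        else:
            linked[entry["pdb_id"]] = polymerase
    return linked, unlinked


# Usage with invented entries; the identifiers are placeholders.
entries = [
    {"pdb_id": "XXX1", "ec_numbers": ["2.7.7.7"], "molecule_name": "Phage T4"},
    {"pdb_id": "XXX2", "ec_numbers": ["2.7.7.49"], "molecule_name": "Unknown RT"},
    {"pdb_id": "XXX3", "ec_numbers": ["2.7.7.6"], "molecule_name": "Other enzyme"},
]
linked, unlinked = link_structures(entries, {"t4": "T4", "phage t4": "T4"})
```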

WEBSITE ORGANIZATION 

Browsing 

Polymerases, references, structures and authors are available
in list form via links on the main navigation bar at the



Table 1. Advantages of polymerase-specific search features

Compared to: Reference searching at PubMed
External search terms: Title, Abstract, Author names, MeSH terms
Polbase search terms: Title, Abstract, Disambiguated Authors,
    Polymerase topics, Polymerase name, Polymerase family,
    Polymerase relationships, Organism, Polymerase property . . .
Example searches possible in Polbase:
    References covering family B polymerases and fidelity
    All references by polymerase author Shonen Yoshida (excluding
        another researcher of the same name)
    All of Arthur Kornberg's publications on DNA polymerase kinetics
    Summarized view of exonuclease activities of wild type and
        mutant phage T4 DNA polymerases

Compared to: Structure searching at PDB
External search terms: Name, Ligands, PDB ID
Polbase search terms: Polymerase, ±DNA in crystal, PDB ID,
    ±Mutants, Polymerase family . . .
Example searches possible in Polbase:
    All structures of family B polymerases with DNA in the crystal
    All structures of wild type and mutant phage RB69 DNA
        polymerases
left of every page (Figure 1). These indices can be filtered
and sorted to quickly find a specific item of interest. Any 
list view can be sorted by column with a click on its 
header. 

The polymerase index page (Figure 1) includes polymer- 
ase name and family with a summary of selected 
properties. Additional tabs provide structure information 
and connectivity with related polymerases (mutants, 
isoforms, etc.). The reference index presents high-level 
summary information about publications in Polbase 
including the number of references by year, polymerase 
family and journal. A browsable index of references in 
Polbase is also available. It allows searching and sorting 
by authors, titles and dates of publication. The author
index includes authors' names and a count of how many
of their publications are listed in Polbase. The structure
index catalogs all polymerases and mutants with structure
information. It includes a field indicating the presence or
absence of DNA in the structure, the structure title and
the polymerase to which it is linked.

Sorting and filtering features on the top-level index
pages facilitate exploration of Polbase content and location
of polymerases having shared properties (Figure 1E
and H).



Individual Pages 

Individual pages are available for most categories of in- 
formation stored in Polbase, including polymerases, refer- 
ences, structures, authors, etc. 

A Polymerase Page (Figure 2) contains information
about relationships with other polymerases (mutants,
digestions, etc.), an interactive map of known mutations
(with each dot associated with a mutant and linking to
its own record and results), a list of relevant references,
PDB structure information (if available), the host organism
name, and a summary of results linked to the
polymerase.

Each Reference Page indicates which polymerases (wild- 
type or mutant) are covered in the reference, the citation 
details including an abstract, Polbase import pipeline 
stage, result links (for each polymerase) and links to
this paper in both PubMed and at the publisher's 
website (if available). 

Each Organism Page displays the list of organisms and 
their kingdoms with the numbers of known polymerases 
(including mutants). 

Structure Pages include PDBsum (13) and Protein Data
Bank entry links, template and/or primer DNA information,
and a link to the Polbase polymerase entry.




[Figure 1 screenshot: Polbase navigation bar (Polymerases, Structures, References, Authors, Search Polbase, FAQs, About Polbase, Account, Contact, Feedback) and the polymerase index table with columns Name, Family, 5'-3' exo activity, 3'-5' exo activity, General error rate, Frameshift rate, Substitution rate and More results.]

Figure 1. Polbase Navigation, Polymerase Index Page. A variety of features are available throughout Polbase, including (A) Links to Polbase 
features, (B) Polbase search features, (C) page-specific help, (D) tabs to categorize information, (E) selective filter tool to display only rows matching 
a search term, (F) more details about how to use Polbase tables, (G) link to a user account page with publications, polymerases, searches, etc., 
(H) sortable column titles, (I) numbers to indicate how many results contribute to summary values, (J) feedback link to aid community driven 
development. 

[Figure 2 screenshot: Polbase page for T4 (phage T4, bacteriophage polymerase T4), a Family B enzyme from bacteriophage T4, showing a mutation position map, a short description of the enzyme, Selected Properties (3'-5' exo activity, 5'-3' exo activity, Frameshift Rate, Substitution Rate, General Error Rate) and tabs for References, Structures, Sequences, Mutants and External links.]



Figure 2. Example polymerase page. See text for description. 



An Author Page (Figure 3) presents an author's publi- 
cation history in graphical and tabular form. It also 
displays a bar graph indicating the number of an 
author's publications on each polymerase. 

Author pages include all relevant publications by that 
author, and exclude publications from other authors with 
similar names. This simple requirement is complicated by 
the fact that authors may publish under multiple names, 
or share the same name. The identification problem is 
compounded by the many forms an author's name may
take on (e.g. Lehman, I.R.; Lehman I; I. Robert Lehman,
etc.). Polbase uses authors' last names and first initials to
construct lists of potentially matching authors. Matching
authors are selectively merged using an iterative disambiguation
algorithm that considers co-authorship, publication
years and topics. Authors are manually split or
merged in case of erroneous mergers or missed matches.
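
A minimal sketch of such a procedure is given below. The text states only that candidates are grouped by last name and first initial and then merged iteratively using co-authorship, publication years and topics; the similarity rule, the 15-year gap, the 'Last Initials' name format and all example data here are assumptions chosen for illustration.

```python
from itertools import combinations


def candidate_key(name):
    """'Doe JA' -> ('doe', 'j'): last name plus first initial.

    Assumes names are written as 'Last Initials'.
    """
    last, _, rest = name.partition(" ")
    return last.lower().rstrip(","), rest[:1].lower()


def profile(name, coauthors, topics, year):
    """Build a one-paper author profile (hypothetical structure)."""
    return {"key": candidate_key(name), "names": {name},
            "coauthors": set(coauthors), "topics": set(topics),
            "years": {year}}


def similar(a, b, max_gap=15):
    """Assumed rule: shared co-author or topic, and nearby years."""
    shared = (a["coauthors"] & b["coauthors"]) or (a["topics"] & b["topics"])
    gap = max(min(a["years"]) - max(b["years"]),
              min(b["years"]) - max(a["years"]), 0)
    return bool(shared) and gap <= max_gap


def merge(a, b):
    """Combine two profiles judged to be the same person."""
    merged = {"key": a["key"]}
    for field_name in ("names", "coauthors", "topics", "years"):
        merged[field_name] = a[field_name] | b[field_name]
    return merged


def disambiguate(profiles):
    """Merge matching candidates repeatedly until nothing changes."""
    profiles = list(profiles)
    changed = True
    while changed:
        changed = False
        for i, j in combinations(range(len(profiles)), 2):
            a, b = profiles[i], profiles[j]
            if a["key"] == b["key"] and similar(a, b):
                profiles[i] = merge(a, b)
                del profiles[j]
                changed = True
                break  # restart: the merged profile may match others
    return profiles


# Usage with invented authors: two papers belong to one 'Doe J'.
authors = disambiguate([
    profile("Doe JA", ["Roe A"], ["Kinetic Parameters"], 1990),
    profile("Doe J", ["Roe A"], ["Nucleotide Substitution"], 1993),
    profile("Doe J", ["Poe B"], ["Structure"], 2010),
])
```

Iterating matters because a merged profile accumulates co-authors and topics that can reveal further matches; manual splitting and merging remains the fallback for the cases such rules get wrong.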



Search features 

Polbase features a text search tool at the top of each page 
that provides a simple interface to the search index. In 
addition to the typically indexed publication fields (title, 
abstract, author names etc.), Polbase also maintains cor- 
related indexes of authors, organisms, polymerase-specific 
properties and polymerases. A search for 'T4' produces the 
expected T4 polymerase record. In addition, correlated 
indices allow this search to return relevant authors, 
journal articles and the host organism, even when those 
items do not directly include the search term. 
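
The idea of correlated indices can be illustrated with a small lookup in which a hit on a polymerase also returns the items linked to it. The data layout, function name and all records below are hypothetical; Polbase's actual search index is not described at this level of detail.

```python
# Invented records; titles, authors and identifiers are placeholders.
POLYMERASES = {
    "T4": {
        "aliases": ["phage T4"],
        "organism": "example host organism",
        "references": ["ref-1"],
    },
}
REFERENCES = {
    "ref-1": {
        "title": "Proofreading by a phage DNA polymerase",
        "authors": ["Doe J", "Roe A"],
    },
}


def correlated_search(term, polymerases, references):
    """Return polymerases matching `term` plus their linked items.

    Linked references, authors and organisms are returned even when
    their own text does not contain the search term.
    """
    wanted = term.lower()
    hits = {"polymerases": [], "references": [],
            "authors": set(), "organisms": set()}
    for name, record in polymerases.items():
        names = [name.lower()] + [alias.lower() for alias in record["aliases"]]
        if wanted not in names:
            continue
        hits["polymerases"].append(name)
        hits["organisms"].add(record["organism"])
        for ref_id in record["references"]:
            hits["references"].append(ref_id)
            hits["authors"].update(references[ref_id]["authors"])
    return hits


hits = correlated_search("T4", POLYMERASES, REFERENCES)
```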

An advanced search tool allows a user to find all poly- 
merases or references containing information about a 
specific topic, or search based on organism, family, etc.
(Figure 4). Because Polbase search is focused exclusively 
on DNA polymerases, search results are more relevant 
than comparable searches at PubMed, UniProt or PDB. 

[Figure 3 screenshot: author page for Kornberg A, showing a 'References by Publication Year' chart, a 'Polymerases by # of References' bar graph and a searchable Publications table with Title, Authors and Year columns.]



Figure 3. Example author page. See text for description. 






Automated notifications 

Users may 'watch' specific polymerases to receive updates
at their chosen frequency. Likewise, searches may be saved
and updates sent when results change or new results become
available. These features can be managed from the
account page (Figure 1G).



FUTURE WORK

Polbase expansion is user directed. There is a simple mechanism
for users to suggest and vote on the priority of new
features (Figure 1J).

With help from collaborators, Polbase's coverage of
enzymes and specific polymerase activities continues to
expand. Polbase continues to add sophistication to its
automated reference discovery features. It currently uses
a two-phase algorithm to identify papers for inclusion.
First, a strict pass identifies papers that should certainly
be included, skipping any ambiguous papers. The second
pass uses frequently seen author names and permits inclusion
of additional papers without incurring a high false
positive rate. Polbase reference discovery features will
continue to be improved; however, it is unrealistic to
expect that all relevant papers will be identified automatically.
To assist with this process, polymerase experts are
encouraged to contribute any missing polymerase-related
publications. Polbase is able to accept entire reference
libraries in most popular formats, instantly categorizing
their contents according to a shared technique or polymerase
and skipping any pre-existing references in the
collection.
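
A two-phase scheme of this kind might look like the sketch below. The strict criterion (a phrase in the title), the author-frequency threshold and the example papers are assumptions made for illustration; the text does not specify the rules Polbase actually applies.

```python
from collections import Counter


def discover(papers, strict_phrase="dna polymerase", min_author_count=2):
    """Two-pass selection of papers for inclusion.

    Pass 1 keeps only papers that clearly qualify (assumed here to mean
    the phrase appears in the title). Pass 2 admits remaining papers that
    have an author seen at least `min_author_count` times in pass 1.
    """
    strict = [p for p in papers if strict_phrase in p["title"].lower()]
    counts = Counter(author for p in strict for author in p["authors"])
    frequent = {a for a, n in counts.items() if n >= min_author_count}
    second = [p for p in papers
              if p not in strict and frequent & set(p["authors"])]
    return strict, second


# Usage with invented papers.
papers = [
    {"title": "Kinetics of a DNA polymerase", "authors": ["Doe J", "Roe A"]},
    {"title": "DNA polymerase fidelity", "authors": ["Doe J"]},
    {"title": "Proofreading exonuclease mutants", "authors": ["Doe J"]},
    {"title": "Unrelated enzyme study", "authors": ["Poe B"]},
]
strict, second = discover(papers)
```

Restricting the second pass to authors who recur in the strict set is one way to widen coverage while keeping false positives low.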



CONCLUSION 

Polbase is the first open, on-line catalog of DNA polymerase
information. It currently contains 183 wild-type
and ~700 mutant polymerases in all 7 families spanning
102 organisms. Over 7300 references are currently
indexed, covering 488 structures and more than 2900
discrete results, and the collection is constantly expanding.
This open database provides a flexible, transparent resource
to the diverse scientists who study and use polymerases in
their work.

SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online: 
Supplementary Table 1. 

[Figure 4 screenshot: Search Polbase form with options to limit results by result category (Wild Type Pols, Mutant Pols, References, Authors), Polymerase, Families, Organisms and Topic; a list of matching references on lesion bypass; and a 'Results for property: Template lesions' table with columns Polymerase, Kingdom, Family, Reference, Result and Context.]



Figure 4. Polbase search features. Users can search Polbase for specific DNA polymerases, DNA polymerases in specific organisms, specific DNA
polymerase families and a variety of DNA polymerase/DNA replication related topics. In this example, a user is searching for information about
'lesion bypass' (A) and identifies specific search terms in the Result categories, Polymerase, Families and Organisms fields. In (B), the user selects the
relevant topic 'Template lesions' to reveal the detailed results displayed in (C).



AVAILABILITY 

Polbase is a free, open, non-commercial resource, and its 
contents may be included (with attribution) in other 
software. Almost all resources are available both as
human-readable HTML renderings and as XML and
JSON encodings for consumption by other software.
URLs have been designed to be predictable and accessible.
To request a document in XML format, simply append
'.xml' to the URL (e.g. for all polymerases, /polymerases.xml).
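
As a sketch of how external software might build such requests, the helper below appends a format suffix to a resource path. The base URL and the '.xml' suffix come from the text; the analogous '.json' suffix, the function names and the assumption that the service is reachable are not confirmed by the source.

```python
from urllib.request import urlopen

BASE_URL = "http://polbase.neb.com"


def resource_url(path, fmt=None):
    """Build a Polbase URL, optionally with a format suffix.

    fmt=None gives the HTML page; 'xml' follows the rule in the text;
    'json' is assumed to work the same way.
    """
    if fmt not in (None, "xml", "json"):
        raise ValueError(f"unsupported format: {fmt!r}")
    url = BASE_URL + "/" + path.strip("/")
    if fmt:
        url += "." + fmt
    return url


def fetch(path, fmt="xml", timeout=30):
    """Download a resource as bytes (requires network access)."""
    with urlopen(resource_url(path, fmt), timeout=timeout) as response:
        return response.read()


# Usage: URL for the list of all polymerases in XML.
all_polymerases_xml = resource_url("/polymerases", "xml")
```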